Conversation

@shuoweil
Contributor

@shuoweil shuoweil commented Oct 4, 2025

When displaying a DataFrame containing JSON columns (including JSON nested in lists or structs), the anywidget table failed to render and fell back to the "Computation deferred" message.

This was caused by a limitation in PyArrow (apache/arrow#45262), which raises an ArrowNotImplementedError when attempting to create an empty Arrow array from an extension type like db_dtypes.JSONArrowType. The TableWidget initialization triggers this error when creating an empty DataFrame to build the table structure before fetching data.

This commit introduces a workaround in bigframes.core.blocks.to_pandas_batches. Before creating the empty DataFrame for the widget, the code now:

  1. Recursively replaces any JSONArrowType in the schema with pyarrow.string to create a "safe" dtype.
  2. Creates the empty pandas.Series using this safe dtype.
  3. Immediately casts the empty Series back to the original, correct JSON dtype.

This approach avoids the PyArrow error while preserving the correct schema for the DataFrame.
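Step 1 (the recursive replacement) can be sketched without any dependencies. This is only an illustrative model, not the code in this PR: `Scalar`, `ListOf`, `StructOf`, and `to_safe_type` are hypothetical stand-ins for the PyArrow types and the helper, chosen so the recursion over lists and structs is easy to follow.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical stand-ins for Arrow types; these are NOT PyArrow classes.
@dataclass(frozen=True)
class Scalar:
    name: str  # e.g. "string", "int64", "json"


@dataclass(frozen=True)
class ListOf:
    value: "ArrowLike"


@dataclass(frozen=True)
class StructOf:
    fields: Tuple[Tuple[str, "ArrowLike"], ...]


ArrowLike = Union[Scalar, ListOf, StructOf]

JSON = Scalar("json")      # models db_dtypes.JSONArrowType
STRING = Scalar("string")  # models pyarrow.string


def to_safe_type(dtype: ArrowLike) -> ArrowLike:
    """Return a copy of dtype with every JSON type replaced by STRING."""
    if dtype == JSON:
        return STRING
    if isinstance(dtype, ListOf):
        # Recurse into the list's element type.
        return ListOf(to_safe_type(dtype.value))
    if isinstance(dtype, StructOf):
        # Recurse into each field, keeping field names and order.
        return StructOf(
            tuple((name, to_safe_type(ftype)) for name, ftype in dtype.fields)
        )
    # Any other type is already safe and is returned unchanged.
    return dtype


# A list of structs with a JSON field, as in the nested case described above.
nested = ListOf(StructOf((("payload", JSON), ("id", Scalar("int64")))))
safe = to_safe_type(nested)
```

In the real change, the empty Series is then created with the safe dtype and cast back to the original dtype (steps 2 and 3), which this sketch does not model.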

Additionally, the existing conversion of JSON data to strings in bigframes.session.executor.py is retained so that data is handled correctly after it is fetched from BigQuery.

Fixes b/448126500 and b/453561268 🦕

@shuoweil shuoweil self-assigned this Oct 4, 2025
@shuoweil shuoweil requested review from a team as code owners October 4, 2025 07:39
@product-auto-label product-auto-label bot added size: s Pull request size is small. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Oct 4, 2025
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels Oct 4, 2025
Collaborator

@tswast tswast left a comment

In b/448126500 I suggest investigating the errors that happen when visualizing STRUCT columns, but this PR doesn't test such cases.

@shuoweil shuoweil force-pushed the shuowei-anywidget-col branch 2 times, most recently from 43a938c to 4a33ccf Compare October 9, 2025 06:43
@shuoweil shuoweil requested a review from tswast October 9, 2025 06:45
@shuoweil shuoweil force-pushed the shuowei-anywidget-col branch from 30dfa7d to 237c134 Compare October 15, 2025 19:45
@shuoweil shuoweil requested a review from tswast October 15, 2025 20:51
Comment on lines 786 to 788
# anywidget mode uses the same display logic as the "deferred" mode
# for faster execution
if opts.repr_mode in ("deferred", "anywidget"):
Collaborator

Let's revert this.

@shuoweil shuoweil force-pushed the shuowei-anywidget-col branch from f18fd9e to 8601e52 Compare October 20, 2025 08:20
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Oct 21, 2025
@shuoweil shuoweil force-pushed the shuowei-anywidget-col branch from 533541a to 860db3e Compare October 21, 2025 01:57
@product-auto-label product-auto-label bot added size: m Pull request size is medium. size: l Pull request size is large. and removed size: l Pull request size is large. size: m Pull request size is medium. labels Oct 21, 2025
@shuoweil shuoweil changed the title from "perf: Default to interactive display for SQL in anywidget mode" to "feat: Display JSON columns in anywidget mode" Oct 21, 2025
jialuoo and others added 7 commits October 30, 2025 19:38
This commit migrates the `round_op` operator from the Ibis compiler to the SQLGlot compiler.
* add error handling for audio_transcribe

* add error handling for pdf functions

* add error handling for image functions

* final touch

* restore rename

* update notebook to better reflect our new code change

* return None on error with verbose=False for image functions

* define typing module in udf

* only use local variable

* Refactor code
* feat: support INFORMATION_SCHEMA tables in read_gbq

* avoid storage semi executor

* use faster tables for peek tests

* more tests

* fix mypy

* Update bigframes/session/_io/bigquery/read_gbq_table.py

* immediately query for information_schema tables

* Fix mypy errors and temporarily update python version

* snapshot

* snapshot again
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Oct 30, 2025
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Oct 30, 2025
@shuoweil shuoweil changed the title from "feat: Display JSON columns in anywidget mode" to "fix: Correctly display DataFrames with JSON columns in anywidget" Oct 30, 2025
@shuoweil shuoweil requested a review from chelsea-lin October 30, 2025 21:55
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Oct 30, 2025
@shuoweil shuoweil changed the title from "fix: Correctly display DataFrames with JSON columns in anywidget" to "fix: Handle empty DataFrames with nested JSON columns in to_pandas_batches()" Oct 30, 2025
@shuoweil shuoweil changed the title from "fix: Handle empty DataFrames with nested JSON columns in to_pandas_batches()" to "feat: Display JSON columns in anywidget mode fix: Correctly display DataFrames with JSON columns in anywidget" Oct 30, 2025
@shuoweil shuoweil changed the title from "feat: Display JSON columns in anywidget mode fix: Correctly display DataFrames with JSON columns in anywidget" to "feat: Display JSON columns in anywidget mode" Oct 30, 2025
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Oct 31, 2025

Labels

api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.

5 participants